Intro to Python - Basics - 1
One should look for what is and not what he thinks should be. (Albert Einstein)
Basics: Topic introduction
In this part of the course, we will cover the following concepts:
- Data Science industry overview
- Python as a programming language and the tools used to write and execute Python code
- Basic operations and data types
Module completion checklist
| Objective | Complete |
|---|---|
| Discuss how programming is used across industries and define core functions of data scientists | |
| Explain the data science life cycle and ways to use predictive modeling | |
| Summarize data science use cases for Python | |
Why are we learning to program?
I’m a […] major. Why do I need to learn programming?
![centered]()
- Programming is becoming a universal skill, much like typing in Word or making eye-catching presentations in PowerPoint was in the ’90s and early 2000s
- Programming lets you perform the same operations many times and at a large scale
- Programming is a necessary component of reproducible research
- Programming makes you think through your problem precisely: you have to clearly formulate your end goal and the methods you want to use
- The list goes on …
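As a small illustration of performing the same operation many times, the sketch below cleans a list of names with a single function. The names and the `clean_name` helper are hypothetical examples, not part of the course materials. Because every step is written down as code, rerunning the script reproduces the same result.

```python
def clean_name(raw: str) -> str:
    """Collapse repeated whitespace and capitalize each word (hypothetical helper)."""
    return " ".join(raw.split()).title()

# Hypothetical messy input; in practice this list could hold millions of entries
raw_names = ["  ada lovelace", "GRACE   hopper ", "alan turing"]

# The same operation is applied to every item in one line of code
cleaned = [clean_name(name) for name in raw_names]
print(cleaned)  # ['Ada Lovelace', 'Grace Hopper', 'Alan Turing']
```

Adding more names to `raw_names` requires no extra work: the loop scales with the data.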
What level of proficiency do I need?
To use programming as a tool in your professional toolkit, you don’t need to be a computer scientist or have the same depth of knowledge as one
The level of proficiency you need will depend on:
- the problems you are trying to solve on a daily basis
- the subject matter area you are in
- the level of sophistication of the solutions you would like to implement
Subject matter experts who also use various programming tools and languages are usually known as data analysts or data scientists
What are the problems you are trying to solve? What is your area of expertise? What level of complexity would you like your programmatic solution to have?
A data scientist can:
- Pose the right question
- Wrangle the data (gather, clean, and sample data to get a suitable dataset)
- Manage the data for easy access by the organization
- Explore the data to generate a hypothesis
- Make predictions using statistical methods such as regression and classification
- Communicate the results using visualizations, presentations, and products
![centered-border]()
What do data scientists do?
Use programming languages and tools to:
- Wrangle the data (gather, clean, and sample data to get a suitable dataset)
- Manage the data for easy access by the organization
- Explore the data to generate a hypothesis
- Make predictions using statistical methods such as regression and classification
It follows that your programming skills should include knowing a programming language (or two, or three, or …) well enough to perform these operations!
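To make the wrangle, explore, and predict steps concrete, here is a minimal sketch using only Python's standard library. The tiny spend-versus-sales dataset is hypothetical, and a simple least-squares line stands in for the regression methods mentioned above; real projects typically use libraries such as pandas and scikit-learn instead.

```python
import csv
import io
import statistics

# Hypothetical raw data: advertising spend vs. sales, with one incomplete row
RAW_CSV = """spend,sales
10,25
20,45
,30
30,65
40,85
"""

# Wrangle: read the rows and drop any row with a missing field
rows = [row for row in csv.DictReader(io.StringIO(RAW_CSV)) if all(row.values())]
spend = [float(row["spend"]) for row in rows]
sales = [float(row["sales"]) for row in rows]

# Explore: simple summary statistics
mean_spend = statistics.mean(spend)
mean_sales = statistics.mean(sales)
print(len(rows), mean_spend, mean_sales)  # 4 25.0 55.0

# Predict: fit a least-squares line sales = slope * spend + intercept
slope = sum((x - mean_spend) * (y - mean_sales) for x, y in zip(spend, sales)) / sum(
    (x - mean_spend) ** 2 for x in spend
)
intercept = mean_sales - slope * mean_spend
print(slope, intercept)  # 2.0 5.0
print(slope * 50 + intercept)  # predicted sales at spend 50: 105.0
```

The incomplete row is dropped during wrangling, so only four rows reach the exploration and prediction steps.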
Module completion checklist
| Objective | Complete |
|---|---|
| Discuss how programming is used across industries and define core functions of data scientists | ✔ |
| Explain the data science life cycle and ways to use predictive modeling | |
| Summarize data science use cases for Python | |
Data science control cycle: framework for data
There is a protocol or standard for working with data that most data scientists follow
- The cycle covers everything from asking the right questions and knowing the data you’re studying to optimizing your model’s performance
Data Science Control Cycle (DSCC)
![centered]()
Question - Which part of the cycle do you think takes up the most time?
- Data Cleaning and Collection
- Data Analysis and Modeling
- Learning New Techniques
How do you think data scientists spend their time?
![centered]()
How data scientists actually spend their time
![centered]()
DSCC: SMART questions
SPECIFIC
- How are you framing the question?
- Which specific variables?
MEASURABLE
- What metrics are you using?
- What are the success criteria?
ACHIEVABLE
- Scope your analysis well
- Use data that is available to you
RELEVANT
- Who will use this analysis?
- Is it interesting or usable?
TIMEBOUND
- Define the time frame of the analysis
- If predicting, over what horizon: next year? next month? ever?
DSCC: research
- Data is key to quality results
- Garbage in, garbage out is the famous programming mantra that holds true for data science as well!
- It should always be on your mind when working with any dataset
- Never overlook either the suitability of the data for your research or its quality
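One practical guard against garbage in, garbage out is to run basic quality checks before any analysis. The sketch below counts missing and implausible values for one field; the `quality_report` helper, the survey records, and the 0 to 120 plausible-age range are all hypothetical assumptions chosen for illustration.

```python
def quality_report(records, field, low, high):
    """Count missing and out-of-range values for one numeric field (hypothetical helper)."""
    values = [record.get(field) for record in records]
    # A value counts as missing if it is None or the key is absent
    missing = sum(1 for v in values if v is None)
    # Present values outside the assumed plausible range [low, high]
    out_of_range = sum(1 for v in values if v is not None and not low <= v <= high)
    return {"rows": len(records), "missing": missing, "out_of_range": out_of_range}

# Hypothetical survey records: one null age, one absent age, one impossible age
people = [{"age": 34}, {"age": None}, {"age": 212}, {}, {"age": 27}]
print(quality_report(people, "age", 0, 120))
# {'rows': 5, 'missing': 2, 'out_of_range': 1}
```

Only two of the five records here are usable as-is, which is exactly the kind of finding that should shape the analysis before modeling starts.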
DSCC: modeling
- A model is, by definition, a replica of a real thing
- Select a model that suits your problem and data, or that simulates the real-life situation as closely as possible
![centered]()
DSCC: steps 4 - 5
![centered]()
- We all have prior knowledge that sometimes predisposes us to make incorrect assumptions
- Don’t let your extensive experience get in the way, always start as if you know nothing about the problem!
![centered]()
- Validate and test your assumptions before delivering results to stakeholders!
- Adjust the model if necessary
- Have you ever made incorrect assumptions about a problem you were trying to solve?
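One common way to validate an assumption is to test it on data the model has not seen. The sketch below holds out 25% of a hypothetical dataset, measures how far an initial assumption is off, and then adjusts the model using the training data only. The data, the split fraction, and the helper names are assumptions for illustration; libraries such as scikit-learn provide a ready-made `train_test_split`.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def mean_absolute_error(pairs, model):
    """Average absolute gap between the observed y and the model's prediction."""
    return sum(abs(y - model(x)) for x, y in pairs) / len(pairs)

# Hypothetical data where y truly equals 2x + 5
data = [(x, 2 * x + 5) for x in range(20)]
train, test = train_test_split(data)

# Initial assumption: y = 2x (no offset)
def assumed(x):
    return 2 * x

# Adjust: estimate the offset from the training data only
offset = sum(y - 2 * x for x, y in train) / len(train)

def adjusted(x):
    return 2 * x + offset

print(mean_absolute_error(test, assumed))   # 5.0
print(mean_absolute_error(test, adjusted))  # 0.0
```

Estimating the offset from `train` and scoring on `test` keeps the check honest: the held-out rows play no part in fitting the adjustment.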
DSCC: step 6
- Interpretation of the results is as important as the results themselves
- Use your best judgment and expertise to communicate to stakeholders the information the data actually carries
- Make your conclusions actionable, so that stakeholders know what next steps to take from looking at your results
Example: campaign response rate
![centered-border]()
Predictive Modeling: When is it not useful?
- Predictive modeling is not useful if you have an unlimited campaign budget and can target everyone
- However, a predictive model provides better targeting information, which helps reduce the campaign budget
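To see how better targeting can reduce the budget, the sketch below ranks customers by a predicted response probability and contacts only as many as the budget allows. The scores, costs, and helper names are hypothetical; in practice the probabilities would come from a trained classification model.

```python
def select_targets(scores, budget, cost_per_contact):
    """Pick the customers with the highest predicted response probability the budget allows."""
    n_contacts = int(budget // cost_per_contact)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_contacts]

def expected_responses(scores, targets):
    """The sum of predicted probabilities is the expected number of responses."""
    return sum(scores[customer] for customer in targets)

# Hypothetical model output: predicted probability that each customer responds
scores = {"c1": 0.02, "c2": 0.30, "c3": 0.05, "c4": 0.22, "c5": 0.01}

targets = select_targets(scores, budget=2.0, cost_per_contact=1.0)
print(targets)  # ['c2', 'c4']
print(expected_responses(scores, targets), expected_responses(scores, scores))
```

With these made-up scores, contacting 2 of the 5 customers yields about 0.52 expected responses versus about 0.60 from contacting everyone, at 40% of the cost.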
Predictive modeling: High Level Overview
![centered]()
Module completion checklist
| Objective | Complete |
|---|---|
| Discuss how programming is used across industries and define core functions of data scientists | ✔ |
| Explain the data science life cycle and ways to use predictive modeling | ✔ |
| Summarize data science use cases for Python | |
What can you do with Python?
Natural Language Processing
- Sentiment analysis
- Twitter analysis using live Twitter feeds
Deep Learning
- Object recognition
- Facial recognition
Visualization
- Interactive visualization deployable to websites
- 3D visualizations
Automation, Big Data, and Predictive Modeling
- Web scraping to automate the collection of data from websites
- Data wrangling with fast and efficient functions
- Big Data modeling through integration with Apache Spark
- Machine learning on structured data
- Data ingestion from various sources
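To give a flavor of web scraping, the sketch below uses Python's built-in `html.parser` module to collect link targets from a page. The HTML string is a hypothetical stand-in for a downloaded page; a real scraper would first fetch the page (for example with `urllib.request.urlopen`) and should respect the site's terms of use. Third-party libraries such as Beautiful Soup are also commonly used for this job.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# Hypothetical page content standing in for a downloaded web page
page = (
    '<html><body><a href="/pricing">Pricing</a>'
    "<p>No link here</p>"
    '<a href="/about">About</a><a>No href</a></body></html>'
)

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/pricing', '/about']
```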
Knowledge check
Link: kc params$basics_knowledge_check_1
Module completion checklist
| Objective | Complete |
|---|---|
| Discuss how programming is used across industries and define core functions of data scientists | ✔ |
| Explain the data science life cycle and ways to use predictive modeling | ✔ |
| Summarize data science use cases for Python | ✔ |
Congratulations on completing this module!
![icon-left-bottom]()